
Extend histogram bucket cap for HTTP and catalog metadata update timers#567

Merged
cbb330 merged 1 commit into linkedin:main from abhisheknath2011:latency-metric on May 1, 2026

Conversation

@abhisheknath2011
Member

Summary

The p99 latency alert for PUT /v1/databases/{databaseId}/tables/{tableId} on tables-service was saturating at ~31s, regardless of how slow snapshot commits actually got. Investigation showed this is not a real timeout; it is a
Micrometer histogram artifact:

  • application.properties enables management.metrics.distribution.percentiles-histogram.all=true, so all timers publish histograms.
  • Micrometer's default Timer.maximumExpectedValue is 30s. The pre-computed PercentileHistogramBuckets table is truncated at this cap, with the highest finite bucket near ~30.5s (rounded to ~31s in dashboards).
  • Anything slower lands in the +Inf bucket and is invisible to histogram_quantile, so p99/max visually pin at 31s.
  • The existing override for catalog_metadata_retrieval_latency=600s proved this; the same pattern was missing for the HTTP server timer and the catalog update timer, both of which can legitimately exceed 30s during metastore contention.
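
For context, that existing override in application.properties looks roughly like this (sketched from the bullet above, assuming the standard Spring Boot management.metrics.distribution prefix; the repo's exact line may differ):

```properties
# Already present: raises the retrieval timer's histogram cap well past the 30s default.
management.metrics.distribution.maximum-expected-value.catalog_metadata_retrieval_latency=600s
```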

Max latency pins at ~31s even though the put-snapshot path takes longer:

[dashboard screenshot: p99/max latency for the PUT endpoint flat-lining at ~31s]

Changes

  • Override Micrometer's default 30s maximumExpectedValue for http.server.requests and catalog_metadata_update_latency in tables-service, raising them to 600s.
  • Add tests covering both the property wiring and the resulting histogram bucket boundaries.

services/tables/src/main/resources/application.properties

  • Add maximum-expected-value.catalog_metadata_update_latency=600s
  • Add maximum-expected-value.http.server.requests=600s (Spring's auto-instrumented HTTP timer; exported to Prometheus as http_server_requests_seconds)
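
Written out with the full Spring Boot prefix (the bullets above abbreviate the keys; this is a sketch assuming the same management.metrics.distribution.* namespace used elsewhere in this file):

```properties
# New overrides: extend finite histogram buckets to 600s so slow snapshot commits
# remain visible to histogram_quantile instead of collapsing into the +Inf bucket.
management.metrics.distribution.maximum-expected-value.catalog_metadata_update_latency=600s
management.metrics.distribution.maximum-expected-value.http.server.requests=600s
```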

services/tables/src/test/java/.../MetricsHistogramConfigurationTest.java

  • Add two property-level assertions (testMaxExpectedValueConfigurationIsSetForUpdateLatency, testMaxExpectedValueConfigurationIsSetForHttpServerRequests).
  • Add two live-registry assertions verifying histogram buckets actually extend to ≥600s for the new timers.
  • Factor shared bucket-assertion logic into assertTimerHistogramExtendsTo600s(...).
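
A minimal sketch of what the shared helper could look like, assuming a Spring-wired MeterRegistry with percentile histograms enabled and JUnit 5 assertions (illustrative, not the exact code in the PR):

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.distribution.HistogramSnapshot;
import java.time.Duration;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

class TimerBucketAssertions {

    // Shared helper: verify that a timer's published histogram has a finite
    // bucket boundary at or beyond 600s, i.e. the 30s default cap was raised.
    static void assertTimerHistogramExtendsTo600s(MeterRegistry registry, String timerName) {
        Timer timer = registry.timer(timerName);
        timer.record(Duration.ofSeconds(1)); // record one sample so the snapshot is populated

        HistogramSnapshot snapshot = timer.takeSnapshot();
        double maxFiniteBucketSeconds = Arrays.stream(snapshot.histogramCounts())
                .mapToDouble(bucket -> bucket.bucket(TimeUnit.SECONDS))
                .filter(Double::isFinite) // ignore any +Inf boundary
                .max()
                .orElse(0.0);

        assertTrue(maxFiniteBucketSeconds >= 600.0,
                timerName + " should expose finite histogram buckets up to at least 600s");
    }
}
```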

[Issue] Briefly discuss the summary of the changes made in this pull request in 2-3 lines.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

./gradlew :services:tables:test --tests "MetricsHistogramConfigurationTest" — all 9 tests pass (3 new + 6 existing).

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

Collaborator

@cbb330 cbb330 left a comment


LGTM!

@cbb330 cbb330 merged commit 55a7c64 into linkedin:main May 1, 2026
1 check passed